
Conversation


@pthmas pthmas commented Jan 16, 2026

Overview

  • Adds a height-based pruning mechanism to the ev-node store:
    • Prunes /h/{height}, /d/{height}, /c/{height}, /i/{hash} up to a target height.
    • Also prunes per-height DA metadata (height → DA height mapping) for pruned heights.
    • Leaves /s/{height} (state), /t (current height), and global metadata keys untouched.
    • Tracks progress via a last-pruned-block-height metadata key so pruning is monotonic and idempotent.
  • Integrates pruning into block production:
    • After a block is committed, at configurable intervals, the executor computes a prune target and calls store.PruneBlocks synchronously.
  • Wires pruning into the go-header store:
    • Sets the go-header pruning window to PruningKeepRecent * BlockTime so header/data retention matches ev-node’s window.
  • Adds pruning for EVM execution metadata:
    • EVMStore exposes PruneExecMeta to delete per-height ExecMeta entries up to a target height.
    • The EVM executor implements an optional ExecMetaPruner interface, and the block executor calls it from the same pruning hook (see the interface sketch after this list).
    • Non-EVM (e.g. ABCI) executors don’t implement this interface, so they are unaffected.
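
For orientation, a minimal sketch of the two interfaces described above is shown below. The method names (PruneBlocks, PruneExecMeta) come from this PR; the exact signatures and package layout are assumptions.

```go
// Hedged sketch of the pruning interfaces described in the overview; the
// signatures and package layout are assumptions, only the method names and
// behavior come from this PR.
package store

import "context"

// Pruner removes historical block data up to a target height.
type Pruner interface {
	// PruneBlocks deletes /h/{height}, /d/{height}, /c/{height}, /i/{hash}
	// and per-height DA metadata up to targetHeight, recording progress in
	// the last-pruned-block-height metadata key so calls are monotonic and
	// idempotent.
	PruneBlocks(ctx context.Context, targetHeight uint64) error
}

// ExecMetaPruner is optionally implemented by executors (e.g. the EVM
// executor) that keep per-height execution metadata of their own; executors
// that don't implement it are simply skipped by the pruning hook.
type ExecMetaPruner interface {
	PruneExecMeta(ctx context.Context, targetHeight uint64) error
}
```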

Config

New node config fields / flags:

  • --evnode.node.pruning_enabled
  • --evnode.node.pruning_keep_recent
  • --evnode.node.pruning_interval

Pruning actually runs only when all three are set to non-trivial values (enabled, keep_recent > 0, interval > 0).

Design trade-offs

  • Runs in the critical path:
    Pruning is called synchronously in produceBlock. On pruning heights, block production incurs extra latency that grows with the amount of data being pruned and depends on underlying storage performance.

  • Best-effort pruning:
    If pruning fails, we log the error but do not fail the block. This avoids halting the node but means disk usage can grow silently if pruning keeps failing; operators will need to watch logs/metrics.

  • Scope:
    This PR prunes ev-node’s own store, the go-header store, and EVM ExecMeta. Pruning of ABCI state (ev-abci) is out of scope and will be added in a follow-up once ev-abci exposes its own pruning API.


github-actions bot commented Jan 16, 2026

The latest Buf updates on your PR. Results from workflow CI / buf-check (pull_request).

| Build | Format | Lint | Breaking | Updated (UTC) |
| --- | --- | --- | --- | --- |
| ✅ passed | ⏩ skipped | ✅ passed | ✅ passed | Jan 26, 2026, 3:01 PM |


claude bot commented Jan 16, 2026

Claude finished @pthmas's task in 2m 5s.


Code Review: Block Pruning Implementation

Review Checklist

  • Read all changed files
  • Analyze code quality & architecture
  • Review error handling & resilience
  • Check security considerations
  • Assess performance & resource efficiency
  • Verify testing & reliability
  • Review documentation

Summary

This PR implements height-based pruning for ev-node's block data store. The implementation is generally well-designed with good idempotent tracking, comprehensive testing, and thoughtful integration across multiple stores. However, there are important concerns about running pruning synchronously in the critical path and some edge cases that need attention.

1. Code Quality & Architecture

✅ Strengths

  • Clean separation of concerns: Pruning is isolated in the Pruner interface (pkg/store/types.go:85-94), making it optional and composable
  • Consistent patterns: EVM execution metadata pruning mirrors the block store pruning design
  • Optional interface pattern: The ExecMetaPruner interface (core/execution/execution.go:165-176) allows graceful degradation for non-EVM executors
  • Good use of metadata keys: Progress tracking via LastPrunedBlockHeightKey and lastPrunedExecMetaKey enables idempotent operations

⚠️ Areas for Improvement

1. Critical Path Synchronous Execution

The pruning runs synchronously during block production (block/internal/executing/executor.go:451-473):

if e.config.Node.PruningEnabled && e.config.Node.PruningKeepRecent > 0 && e.config.Node.PruningInterval > 0 {
    if newHeight%e.config.Node.PruningInterval == 0 {
        // ... pruning code runs synchronously here
    }
}

Concern: Large pruning operations (deleting hundreds/thousands of heights) will block block production, potentially causing:

  • Missed block production windows
  • Timeouts in consensus
  • Poor UX during initial catch-up after enabling pruning

Recommendation: Consider one of:

  • Background goroutine: Move pruning to a background worker that runs asynchronously (a rough sketch follows this list)
  • Rate limiting: Prune a small batch per block (e.g., 10 heights per invocation) to bound latency
  • Separate pruning loop: Run pruning on a separate timer/trigger independent of block production
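
For illustration, the background-goroutine option could look roughly like the sketch below. Only PruneBlocks (via the prune function field) comes from this PR; the channel-based trigger, field names, and wiring are assumptions.

```go
// Hedged sketch of asynchronous pruning: produceBlock would send a prune
// target on the channel instead of calling PruneBlocks inline.
package executing

import (
	"context"

	"github.com/rs/zerolog"
)

type asyncPruner struct {
	prune   func(ctx context.Context, targetHeight uint64) error // e.g. store.PruneBlocks
	targets chan uint64                                          // prune targets sent by the block producer
	logger  zerolog.Logger
}

func (p *asyncPruner) run(ctx context.Context) {
	for {
		select {
		case <-ctx.Done():
			return
		case target := <-p.targets:
			// PruneBlocks is idempotent and monotonic, so a failed or skipped
			// target is simply covered by the next one.
			if err := p.prune(ctx, target); err != nil {
				p.logger.Error().Err(err).Uint64("target_height", target).Msg("background pruning failed")
			}
		}
	}
}
```

On the producer side, a non-blocking send into a buffered targets channel would keep block production from ever waiting on pruning.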

2. Missing Concurrency Safety

store.PruneBlocks() (pkg/store/store.go:283-368) creates a new batch and iterates through heights. There's no explicit locking to prevent concurrent pruning calls if future refactoring enables async pruning or multiple goroutines.

Recommendation: Add a mutex or atomic flag to prevent concurrent pruning operations, even if current code is sequential.
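
A minimal guard, assuming an atomic flag is acceptable, might look like the sketch below; neither the flag nor the wrapper is part of the PR, and in practice the flag would live on the store struct rather than at package level.

```go
// Hedged sketch: skip a prune call if another one is already in flight.
package store

import (
	"context"
	"sync/atomic"
)

var pruneInProgress atomic.Bool

func pruneGuarded(ctx context.Context, prune func(context.Context, uint64) error, target uint64) error {
	if !pruneInProgress.CompareAndSwap(false, true) {
		return nil // another prune is running; the next target will cover this range
	}
	defer pruneInProgress.Store(false)
	return prune(ctx, target)
}
```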

3. go-header Pruning Window Calculation

In pkg/sync/sync_service.go:261-267:

if syncService.conf.Node.PruningEnabled && syncService.conf.Node.PruningKeepRecent > 0 {
    pruningWindow := syncService.conf.Node.BlockTime.Duration * time.Duration(syncService.conf.Node.PruningKeepRecent)
    if pruningWindow > 0 {
        syncOpts = append(syncOpts, goheadersync.WithPruningWindow(pruningWindow))
    }
}

This assumes BlockTime is constant and accurate. If blocks are produced faster/slower than configured (lazy mode, network delays), the time-based window may not align with the height-based ev-node pruning.

Recommendation: Document this assumption or add validation to warn if actual block times diverge significantly from configured times.
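
One hedged alternative to a runtime warning is to pad the derived window with a safety factor, so the go-header store retains at least as much history as ev-node even when real block times run slower than the configured BlockTime. The factor and helper below are assumptions, not part of the PR.

```go
// Hedged sketch: derive the go-header pruning window with a safety margin so
// the time-based header window never ends up shorter than the height-based
// ev-node retention.
package sync

import "time"

const pruningWindowSafetyFactor = 2 // assumption; tune to the expected block-time drift

func pruningWindow(blockTime time.Duration, keepRecent uint64) time.Duration {
	return blockTime * time.Duration(keepRecent) * pruningWindowSafetyFactor
}
```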

2. Error Handling & Resilience

✅ Strengths

  • Best-effort pruning: Failures are logged but don't halt block production (executor.go:458, 468)
  • Idempotent design: PruneBlocks and PruneExecMeta check lastPruned and skip already-pruned ranges
  • Graceful handling of missing data: errors.Is(err, ds.ErrNotFound) checks allow pruning to continue even if some entries are already gone

⚠️ Potential Issues

1. Silent Failures Leading to Unbounded Disk Growth

The current design logs errors but continues:

if err := e.store.PruneBlocks(e.ctx, targetHeight); err != nil {
    e.logger.Error().Err(err).Uint64("target_height", targetHeight).Msg("failed to prune old block data")
}

Concern: If pruning consistently fails (e.g., disk full, permissions issue), operators won't notice until disk fills up. There's no metric, alert, or health check for pruning failures.

Recommendation:

  • Add a Prometheus metric for pruning failures (a rough sketch follows this list)
  • Add a health endpoint that exposes last successful prune time
  • Consider degrading node health status if pruning fails repeatedly
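
A failure counter with client_golang could be as small as the sketch below; the metric name, namespace, and registration point are assumptions.

```go
// Hedged sketch of a pruning-failure counter.
package executing

import "github.com/prometheus/client_golang/prometheus"

var pruneFailures = prometheus.NewCounter(prometheus.CounterOpts{
	Namespace: "evnode",
	Subsystem: "store",
	Name:      "prune_failures_total",
	Help:      "Number of failed block-pruning attempts.",
})

func init() {
	prometheus.MustRegister(pruneFailures)
}
```

The pruning hook would then call pruneFailures.Inc() next to the existing error log, and an alerting rule on the counter replaces log-watching.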

2. Partial Batch Commit Failures

PruneBlocks builds a large batch and commits once (store.go:363-365). If commit fails partway through, the lastPruned marker won't update, causing the next prune to retry the entire range.

Recommendation: This is acceptable for now, but document that large pruning ranges after enabling pruning may have retry overhead. Consider chunking batches in the future.
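
A hedged sketch of what chunking could look like: call PruneBlocks with intermediate targets so each chunk commits (and advances the lastPruned marker) on its own, and a failed commit only retries one chunk. The chunk size and helper are assumptions.

```go
// Hedged sketch: prune a large range in fixed-size chunks. Relies on
// PruneBlocks being idempotent and tracking its own lastPruned marker, as
// described in this PR; the helper itself is not part of the PR.
package store

import "context"

const pruneChunkSize = 1000 // heights per chunk; value is an assumption

func pruneInChunks(ctx context.Context, prune func(context.Context, uint64) error, lastPruned, target uint64) error {
	for next := lastPruned + pruneChunkSize; ; next += pruneChunkSize {
		if next > target {
			next = target
		}
		if err := prune(ctx, next); err != nil {
			return err
		}
		if next == target {
			return nil
		}
	}
}
```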

3. Context Cancellation Not Checked in Loop

The pruning loops in PruneBlocks (store.go:309-356) and PruneExecMeta (evm/store.go:173-180) don't check ctx.Done(). If the node is shutting down during a long pruning operation, it won't cancel gracefully.

Recommendation: Add select { case <-ctx.Done(): return ctx.Err(); default: } inside the pruning loops.
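
In context, the check would sit at the top of the per-height loop, roughly as in the sketch below; deleteHeight stands in for the per-height batch deletes and is not a real function in the PR.

```go
// Hedged sketch of cooperative cancellation inside the pruning loop.
package store

import "context"

func pruneRange(ctx context.Context, lastPruned, target uint64, deleteHeight func(uint64) error) error {
	for h := lastPruned + 1; h <= target; h++ {
		select {
		case <-ctx.Done():
			return ctx.Err() // stop promptly on shutdown; the next prune resumes from lastPruned
		default:
		}
		if err := deleteHeight(h); err != nil {
			return err
		}
	}
	return nil
}
```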

3. Security

✅ No Major Concerns

  • Pruning only affects historical data, not consensus-critical state
  • Proper input validation on config params (config.go:317-328)
  • No user input directly flows into pruning operations

⚠️ Minor Considerations

1. DoS via Aggressive Pruning Config

A misconfigured node with PruningInterval=1 and PruningKeepRecent=1000000 could theoretically cause performance degradation.

Recommendation: Add sanity checks or warnings:

if c.Node.PruningInterval == 1 && c.Node.PruningKeepRecent > 10000 {
    return fmt.Errorf("pruning on every block with large retention is not recommended")
}

2. No Access Control on Config Changes

This is an operational concern, not code: ensure only trusted operators can modify pruning configs to prevent accidental data loss.

4. Performance & Resource Efficiency

⚠️ Major Concerns

1. Synchronous Pruning Latency

As mentioned earlier, pruning runs in the critical path. Benchmarking is needed to understand:

  • How long does pruning 1000 heights take on different storage backends?
  • What's the worst-case latency impact on block production?

Recommendation: Add benchmark tests:

func BenchmarkPruneBlocks(b *testing.B) {
    // Seed 10000 blocks, measure pruning time for various ranges
}

2. Disk I/O Spikes

Deleting thousands of keys in a single batch can cause I/O spikes. The underlying datastore (LevelDB, BadgerDB, etc.) may compact or reorganize, causing unpredictable latency.

Recommendation: Monitor disk I/O metrics and consider rate-limiting pruning operations.

3. Memory Usage During Batch Building

The batch accumulates deletions for all heights in [lastPruned+1, height]. For large ranges, this could allocate significant memory.

Current Impact: Likely minimal for typical pruning intervals (e.g., pruning every 100 blocks with 1000-block retention). But needs monitoring.

5. Testing & Reliability

✅ Strengths

  • Comprehensive unit tests: store_test.go has excellent coverage (TestPruneBlocks_RemovesOldBlockDataOnly, TestPruneBlocks_TracksLastPrunedHeight)
  • EVM store tests: evm/store_test.go validates PruneExecMeta idempotency
  • Edge case coverage: Tests verify:
    • Idempotent pruning (same height twice)
    • Missing entries (already pruned data)
    • Invalid metadata (graceful degradation)

⚠️ Gaps

1. No Integration Tests

There are no integration tests that:

  • Verify pruning actually runs during block production
  • Check that pruned data is inaccessible
  • Validate that go-header store pruning aligns with ev-node pruning

Recommendation: Add an E2E test that:

  1. Produces 100 blocks with PruningInterval=10, PruningKeepRecent=50
  2. Verifies blocks 1-49 are pruned after block 100
  3. Confirms blocks 50-100 remain accessible

2. No Performance/Load Tests

Missing tests for:

  • Pruning performance under load
  • Behavior when pruning large backlogs (e.g., enabling pruning on an archive node)

Recommendation: Add a stress test that simulates enabling pruning on a node with 100k blocks.

3. No Tests for Failure Recovery

What happens if:

  • Pruning fails halfway through a batch commit?
  • The node crashes during pruning?

Recommendation: Add tests with mocked batch.Commit() failures to verify recovery behavior.

6. Documentation

✅ Strengths

  • Clear config comments: config.go:239-244 explains each pruning parameter
  • Inline comments: Good explanations in store.go:277-282 and executor.go:447-450
  • PR description: Excellent overview of design trade-offs

⚠️ Improvements Needed

1. Missing Operational Docs

There's no documentation on:

  • How to enable pruning on an existing node
  • Performance impact expectations
  • How to monitor pruning health
  • What to do if pruning fails

Recommendation: Add a docs/pruning.md file covering operational aspects.

2. Unclear Interaction Between Stores

It's not immediately obvious how the three pruning mechanisms (ev-node store, go-header store, EVM store) interact or if they can get out of sync.

Recommendation: Add a comment in executor.go explaining the coordination:

// Pruning coordination:
// 1. ev-node store prunes /h/, /d/, /c/, /i/ up to targetHeight
// 2. go-header store prunes headers within the time-based window
// 3. EVM store prunes ExecMeta up to targetHeight (if executor implements ExecMetaPruner)
// All three use independent progress tracking, so partial failures are isolated.

3. No Upgrade Path Documentation

What happens when upgrading from a non-pruning node to a pruning-enabled node?

Recommendation: Document:

  • Backward compatibility (pruning can be enabled/disabled anytime)
  • Forward compatibility (pruned nodes can't serve full history to peers)

Critical Issues Summary

| Priority | Issue | Impact | Recommendation |
| --- | --- | --- | --- |
| 🔴 High | Synchronous pruning in critical path | Block production delays | Move to background or rate-limit |
| 🟡 Medium | Silent pruning failures | Unbounded disk growth | Add metrics and health checks |
| 🟡 Medium | Missing integration tests | Unknown real-world behavior | Add E2E pruning tests |
| 🟡 Medium | No context cancellation in loops | Slow shutdown during pruning | Add ctx.Done() checks |
| 🟢 Low | go-header time-based window drift | Potential misalignment | Document assumption |

Recommendations for Follow-up PRs

  1. Async pruning (High priority): Refactor to run pruning in a background worker
  2. Monitoring (High priority): Add Prometheus metrics for pruning operations
  3. Integration tests (Medium priority): Verify end-to-end pruning behavior
  4. Operational docs (Medium priority): Add pruning guide for operators

Final Verdict

This is a well-engineered feature with solid fundamentals:

  • ✅ Clean architecture with optional interfaces
  • ✅ Idempotent and resumable design
  • ✅ Comprehensive unit test coverage
  • ✅ Proper error handling at the code level

However, the synchronous execution in the critical path is a significant concern that should be addressed before merging to production or documented as a known limitation with plans to refactor.

Approval Status: Approve with recommendations

  • Safe to merge if synchronous pruning latency is acceptable for expected workloads
  • Strongly recommend prioritizing async pruning in a follow-up PR
  • Add integration tests and monitoring before deploying to production

@julienrbrt (Member) commented Jan 16, 2026

Given that we have a lot of different stores and keys, are we planning to prune those as well:

(and eventually ev-abci store)

It will be inconsistent if we don't allow height pruning of all those stores (at once).

@pthmas (Contributor, Author) replied

I've updated this PR to prune metadata, the go-header store, and the EVM store ✅
I'm happy to work on pruning ev-abci next, but I would probably do it in another PR to prevent this PR from getting too big, if that's ok.

@pthmas pthmas changed the title Block Prunning feat: block Prunning Jan 26, 2026
@pthmas pthmas changed the title feat: block Prunning feat: block Pruning Jan 26, 2026
@pthmas pthmas force-pushed the pierrick/prunning branch from 3e1e8e0 to ed66fe6 on January 26, 2026 at 15:01
